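ED 111 8[?]4                                            TM 004 811

AUTHOR        Besel, Ronald
TITLE         A Comparison of Emrick and Adams' Mastery-Learning Test
              Model with Kriewall's Criterion-Referenced Test Model.
INSTITUTION   Southwest Regional Laboratory for Educational Research
              and Development, Los Alamitos, Calif.
REPORT NO     TM-5-71-04
PUB DATE      21 Apr 71
NOTE          17p.; For related documents, see TM 004 812 and ...

EDRS PRICE    MF-$0.76 HC-$1.58 Plus Postage
DESCRIPTORS   Bayesian Statistics; *Comparative Analysis; ...
IDENTIFIERS   Kriewall's Criterion Referenced Test Model; Mastery
              Learning Test Model (Emrick and Adams)

ABSTRACT
     Kriewall's Criterion-Referenced Test Model is compared to the
Mastery-Learning Test Model of Emrick and Adams. Testing in the
context of instructional management serves three purposes:
performance evaluation, placement (classification of students), and
diagnosis of learning deficiencies. Both of the test models discussed
here assess the achievement of objectives; they differ in the types
of objectives for which they are best suited. Both test models have
potential usefulness for making placement decisions, but only the ML
model is likely to be useful in diagnosing learning deficiencies. The
applicability of each model for instructional management decisions is
discussed. (Author/DEP)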







SOUTHWEST REGIONAL LABORATORY
TECHNICAL MEMORANDUM

DATE: April 21, 1971

NO: TM 5-71-04

TITLE: A COMPARISON OF EMRICK AND ADAMS' MASTERY-LEARNING TEST MODEL
WITH KRIEWALL'S CRITERION-REFERENCED TEST MODEL
AUTHOR: Ronald Besel



ABSTRACT

The assumptions of the Criterion-Referenced Test (CRT) model proposed
by Kriewall are compared to those of Emrick and Adams' Mastery-Learning
(ML) model. The applicability of each model for instructional management
decisions is discussed.






THE MASTERY-LEARNING TEST MODEL:
COMPARISON WITH KRIEWALL'S CRT MODEL



Numerous psychometric models have been used for interpreting testing
data (Lord and Novick, 1968); most of these models are appropriate for
norm-referenced tests (NRT). Kriewall (1969) criticized the use of NRT
models for interpreting criterion test data. He proposed a model for
criterion tests which he called the CRT model. In the same critical
spirit, Emrick and Adams (1970) proposed a Bayesian mastery-learning
(ML) model.

Testing, in the context of instructional management, serves three
general purposes: performance evaluation (achievement of objectives),
placement (classification of students for instruction), and diagnosis
of learning deficiencies. Both of the test models discussed here assess
the achievement of objectives; they differ in the types of objectives
for which they are best suited. Both test models have potential
usefulness for making placement decisions, but only the ML model is
likely to be useful in diagnosing learning deficiencies.

Psychometric Assumptions

The ML model assumes that a test measures a single skill and that
there are only two true states of proficiency with respect to that skill.
Each individual tested is in either the mastery ($M$) or non-mastery
($\bar{M}$) state at the time of testing. The CRT model assumes,
likewise, that a test is a measure of a single skill, defined by a
specified content objective (SCO), i.e., a rule or procedure for
generating a class of problems, but proficiency is assumed to be a
continuum between mastery and non-mastery.



Both the CRT and the ML models assume that an individual's responses
to the separate items on a test can be treated as a sequence of
independent Bernoulli trials, each having the same probability of
success. The consequences of this assumption are:

(a) The probability that a given individual will give a correct
    response to any item from the test (or from the domain from
    which the items were selected) is the same for all items;
(b) No learning occurs during the time of test administration;
(c) The outcome of any trial (item response) is independent of
    the outcomes of every other trial.

Let

$x_{ia}$ represent the response of individual $a$ to item $i$ (trial $i$),

$X_a$ = observed score (number of correct responses) for individual $a$.

\[ x_{ia} = \begin{cases} 1 & \text{if the correct response is given} \\ 0 & \text{if an incorrect response is given} \end{cases} \tag{1} \]

\[ X_a = x_{1a} + x_{2a} + \cdots + x_{na} \tag{2} \]

The subscript "a" will be deleted when the referent of an observed item
response or test score is not a particular individual. Since both models
assume that test performance can be represented mathematically as a
sequence of independent Bernoulli trials, each hypothesizes that if an
individual is repeatedly given parallel tests, his score distribution
will be binomial with the probability of a correct item response ($p_a$)
as the distribution parameter. For an n-item test, the score
distribution function is:

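\[ P(X_a) = \binom{n}{X_a} \, p_a^{X_a} \, (1 - p_a)^{n - X_a} \tag{3} \]

where

\[ \binom{n}{X_a} = \frac{n!}{X_a! \, (n - X_a)!} \tag{4} \]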
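
The binomial score distribution of equation (3) is easy to evaluate
numerically. The following is a minimal Python sketch and not part of
the original memorandum; the function name `score_pmf` and the example
values (a 5-item test, item success probability 0.9) are illustrative
assumptions.

```python
from math import comb


def score_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x correct responses on an n-item test,
    assuming independent items with a common success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Hypothetical example: score distribution for n = 5 and p = 0.9.
distribution = [score_pmf(x, 5, 0.9) for x in range(6)]
```

The list `distribution` holds the probability of each possible score
from 0 to 5; its entries sum to one, as any score distribution must.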



The CRT and ML models differ in their interpretation of the parameter
$p_a$. The CRT model assumes that $p_a$ is equivalent to a "true score"
as in typical NRT models. For any individual, "true proficiency" is
estimated from observed score:

\[ \hat{p}_a = \frac{X_a}{n} \tag{5} \]

The ML model assumes that $p_a$ is a single constant value $(1 - \beta)$
for all individuals in the mastery ($M$) state and a constant $\alpha$
for all individuals in the non-mastery ($\bar{M}$) state:

$\alpha$ = The probability that an individual in the $\bar{M}$ state
           will give a correct item response.
$\beta$ = The probability that an individual in the $M$ state will give
          an incorrect item response.

Both $\alpha$ and $\beta$ are assumed to have true values which are
characteristic of the test. They must be estimated from the responses
of some reference group of individuals. Two conditional distributions
can represent the expected score distributions for all individuals when
the ML model is employed:

\[ P(X/M) = \binom{n}{X} (1 - \beta)^{X} \beta^{\,n - X} \tag{6} \]

\[ P(X/\bar{M}) = \binom{n}{X} \alpha^{X} (1 - \alpha)^{n - X} \tag{7} \]
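
The two conditional score distributions can be sketched in Python as
follows. This is an illustration only: the function names are
hypothetical, and the parameter values used in the example (alpha = .5,
beta = .1, a 5-item test) are borrowed from the worked examples later
in this memorandum.

```python
from math import comb


def p_score_given_master(x: int, n: int, beta: float) -> float:
    """P(X/M): a master answers each item correctly with probability 1 - beta."""
    return comb(n, x) * (1 - beta) ** x * beta ** (n - x)


def p_score_given_nonmaster(x: int, n: int, alpha: float) -> float:
    """P(X/not-M): a non-master answers each item correctly with probability alpha."""
    return comb(n, x) * alpha**x * (1 - alpha) ** (n - x)


# Illustrative values: 4 correct out of 5 items, alpha = .5, beta = .1.
likelihood_master = p_score_given_master(4, 5, 0.1)
likelihood_nonmaster = p_score_given_nonmaster(4, 5, 0.5)
```

With these values a score of 4 is roughly twice as likely for a master
as for a non-master, which is what makes the score informative.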

The CRT model characterizes each individual tested by his estimated
"true proficiency" using equation (5). The ML model, on the other hand,
characterizes each individual by his estimated probability of being in
the mastery state. This probability can be computed using Bayes' formula:

\[ P(M/X) = \frac{PR(M) \cdot P(X/M)}{P(X)} \tag{8} \]

where $PR(M)$ equals the prior probability that the individual is in the
mastery state and

\[ P(X) = PR(M) \cdot P(X/M) + PR(\bar{M}) \cdot P(X/\bar{M}) \tag{9} \]

where $PR(\bar{M}) = 1 - PR(M)$.
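
Equations (8) and (9) combine into a single posterior computation. The
sketch below is illustrative rather than the author's procedure; the
function name `posterior_mastery` is an assumption, and the example
values (prior .6, 4 correct of 5 items, alpha = .5, beta = .1) are those
of Example 1 later in this memorandum.

```python
from math import comb


def posterior_mastery(x: int, n: int, prior: float,
                      alpha: float, beta: float) -> float:
    """P(M/X) by Bayes' formula for a score of x on an n-item test."""
    like_master = comb(n, x) * (1 - beta) ** x * beta ** (n - x)
    like_nonmaster = comb(n, x) * alpha**x * (1 - alpha) ** (n - x)
    # Equation (9): total probability of the observed score.
    p_x = prior * like_master + (1 - prior) * like_nonmaster
    # Equation (8): posterior probability of the mastery state.
    return prior * like_master / p_x


# Values from Example 1: prior .6, score 4 of 5, alpha = .5, beta = .1.
p_mastery = posterior_mastery(4, 5, 0.6, 0.5, 0.1)
```

The result is approximately .76, in agreement with the value read from
Figure 1 in Example 1.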

The ML model requires valid procedures for estimating prior probabilities.
Procedures applicable to SWRL instructional programs will be discussed in
a later section.

Kriewall assumes that rigorous item-sampling procedures will be
followed to construct parallel criterion-referenced tests. Emrick and
Adams do not specify a test construction procedure for an ML test;
Kriewall's method would be applicable, but the ML model may also be
valid for criterion tests constructed using less rigorous procedures.
The reference group used to estimate the $\alpha$ and $\beta$ parameters
could also be employed to test the equal item-difficulty assumption
inherent in both models. If this assumption is found to be empirically
untenable, the test model or the SCO domain may require modification.

Measurement errors are interpreted differently by the two test
models. Kriewall assumes that all measurement errors are random with
an expected value equal to zero. The observed score can then be
interpreted as an unbiased estimate of the percentage of the items in
the content domain for which the individual knows the correct response.
The interpretation of a test which is biased (e.g., a multiple-choice
test) is not discussed by Kriewall; the CRT test model has an intuitively
appealing interpretation only for unbiased tests. The $\alpha$ and
$\beta$ parameters of the ML model represent two types of bias errors.
Both constructed-response and selected-response tests can be interpreted
readily when the ML model is employed. The $\alpha$ value for a
selected-response test is likely to be significantly larger than the
$\alpha$ value for a comparable constructed-response test.
Selected-response tests, however, may achieve smaller $\beta$
parameters.



Prior Probabilities 



A Bayesian model is, in general, more efficient than one employing
classical statistics to the extent that prior probabilities can be
precisely estimated. Two general classes of prior probability estimates
can be used by the ML model. The first class includes estimates of the
proportion of an appropriate reference group which is in the mastery
state. In interpreting an individual's observed score, he is treated
as a random sample from the reference group.

The second class of prior probability estimates includes only
methods which use other past or present numerical information relevant
to an individual. These will be called personalized prior probabilities
and the subscript (a) will be added to the symbol for a prior probability:

$PR_a(M)$ = personalized prior probability for individual (a).
For criterion-referenced tests, three types of pupil performance
data seem to be most relevant to estimating personalized prior
probabilities. The first is the P(M/X) value for a similar objective
for which assessment was made in the recent past (e.g., skill in reading
the words on a recent vocabulary list should be a potential value for
$PR_a(M)$ for the vocabulary list of the next instructional unit). If a
hierarchical relationship between objectives exists, another reasonable
estimate of a prior probability would be the probability of mastery of
the objective directly below the one in question. This approach may
use either current or past test data depending on the testing schedule.
A third method employs the scores on a pretest; preferably a parallel
version of the posttest is used.

Instructional Decisions

Either the estimated proficiency ($\hat{p}_a$) of the CRT model or the
probability of mastery, P(M/X), of the ML model can be used as a decision
variable when classifying pupils for instructional purposes. In theory,
the decision variable can be used to classify pupils into m subgroups,
where m can take any integer value which does not exceed the number of
discrete levels assumed by the decision variable. In a school setting
it is not likely that classification into more than three groups will
be practical. Kriewall treats only the case of classification into two
groups: masters and non-masters.

For the CRT model, the following steps are taken in selecting the
test length (n) and the acceptable "passing" score (c) for the two-group
classification.

1. A minimal acceptable proficiency (criterion level) is selected.
   The nominal student is defined to be one whose proficiency
   equals this criterion level ($p_1$).
2. The acceptable probability of a Type I error (classifying a
   nominal student as a non-master) is chosen.
3. A proficiency level, $p_2$, less than $p_1$ is selected to represent
   the "nominal non-master."
4. The acceptable probability of a Type II error (classifying a
   nominal non-master as a master) is chosen.
5. Values of n and c which satisfy the Type I and Type II error limits
   for the chosen $p_1$ and $p_2$ proficiencies are solved for
   iteratively.
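
The iterative solution in the last step can be sketched as a simple
search over test lengths and passing scores. This is one possible
search, not necessarily the procedure Kriewall used. The function names,
the proficiency labels `p1` (criterion level) and `p2` (nominal
non-master), and the numerical inputs are all assumptions chosen for
illustration.

```python
from math import comb


def prob_at_least(c: int, n: int, p: float) -> float:
    """P(X >= c) for a binomial score on an n-item test."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(c, n + 1))


def find_test_plan(p1: float, p2: float, max_type1: float,
                   max_type2: float, max_n: int = 200):
    """Return the shortest (n, c) meeting both error limits, or None."""
    for n in range(1, max_n + 1):
        for c in range(1, n + 1):
            type1 = 1.0 - prob_at_least(c, n, p1)  # criterion-level student fails
            type2 = prob_at_least(c, n, p2)        # nominal non-master passes
            if type1 <= max_type1 and type2 <= max_type2:
                return n, c
    return None


# Hypothetical inputs: criterion level .9, non-master level .5,
# and both error probabilities limited to .10.
plan = find_test_plan(p1=0.9, p2=0.5, max_type1=0.10, max_type2=0.10)
```

Searching from the shortest test upward returns the smallest n for which
some passing score satisfies both limits, which is usually the quantity
of practical interest.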
For the ML model, the "passing score," c, for the two-group
classification problem can be computed using a simple expected loss
model. Let

$L(c)$ = loss for a passing score of $X = c$
Type I error: classifying a master as a non-master (false fail)
Type II error: classifying a non-master as a master (false pass)
$L_1$ = cost of making a Type I error
$L_2$ = cost of making a Type II error
$E[L(c)]$ = expected loss

The expected loss for any selected value c is

\[ E[L(c)] = L_1 \sum_{X=0}^{c-1} P(M/X) \cdot P(X) + L_2 \sum_{X=c}^{n} P(\bar{M}/X) \cdot P(X) \tag{11} \]

Increasing c by 1 will result in deleting one term from the second
summation in equation (11) and adding one term to the first. The
expected loss is minimized by including in the first summation only
those terms for which $L_1 \cdot P(M/X)$ is less than
$L_2 \cdot P(\bar{M}/X)$. Equation (11) assumes that a single "passing
score" must be selected for classifying each individual in the group.
This restriction need only be made if each individual in the group is
assumed to have the same prior probability of being in the mastery
state. If personalized prior probabilities can be estimated, the
computed P(M/X) for each individual can be used directly as a decision
variable. The P(M/X) value which yields the minimum expected loss is
then:



' P(M/X) - L2 • P(M/X) . 



(12), 



P(M/X) = 1 

■ . h + L2 •. (13) 
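
The step from equation (12) to equation (13) is short and may be worth
writing out. The derivation below assumes only that the two states are
exhaustive, so that the probability of non-mastery given X is one minus
P(M/X); it reads the criterion level in equation (13) as
$L_2 / (L_1 + L_2)$.

```latex
\begin{align*}
L_1 \cdot P(M/X) &= L_2 \cdot P(\bar{M}/X) \\
L_1 \cdot P(M/X) &= L_2 \cdot \bigl[ 1 - P(M/X) \bigr] \\
(L_1 + L_2) \cdot P(M/X) &= L_2 \\
P(M/X) &= \frac{L_2}{L_1 + L_2} = \frac{1}{1 + L_1 / L_2}
\end{align*}
```

For instance, if a false pass is taken to be three times as costly as a
false fail ($L_2 / L_1 = 3$), the criterion level is $3/4 = .75$.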

This probability of mastery value is a criterion level in the same sense
that "80 percent correct answers" is used as a criterion level. If the
computed P(M/X) exceeds this criterion level, the individual is
classified as a master; otherwise he is classified as a non-master. The
ratio, $L_2/L_1$, will be referred to as the loss ratio. The effect of
prior probability on the selection of the optimal c value is illustrated
by Figure 1. For a loss ratio equal to 3, c should be set equal to 5 for
prior probabilities between .15 and .59; c should equal 4 for prior
probabilities between .59 and .92, and equal to 3 for prior
probabilities greater than .92. Thus, if personalized prior
probabilities are employed, the test will not have a fixed passing score.
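
The dependence of the optimal passing score on the prior probability can
be checked numerically by minimizing the expected loss of equation (11)
over c. The sketch below is an illustration under stated assumptions:
the function names are hypothetical, a 5-item test with alpha = .5 and
beta = .1 is assumed (the values of the worked examples), and the loss
ratio is taken as $L_2/L_1 = 3$. It uses the identity
P(M/X) P(X) = PR(M) P(X/M) to avoid computing the posterior explicitly.

```python
from math import comb


def expected_loss(c: int, n: int, prior: float, alpha: float,
                  beta: float, l1: float, l2: float) -> float:
    """Expected loss when scores X >= c are classified as mastery."""
    # Masters who score below c (false fails).
    false_fail = sum(prior * comb(n, x) * (1 - beta) ** x * beta ** (n - x)
                     for x in range(0, c))
    # Non-masters who score c or above (false passes).
    false_pass = sum((1 - prior) * comb(n, x) * alpha**x * (1 - alpha) ** (n - x)
                     for x in range(c, n + 1))
    return l1 * false_fail + l2 * false_pass


def optimal_passing_score(n: int, prior: float, alpha: float, beta: float,
                          l1: float = 1.0, l2: float = 3.0) -> int:
    """Passing score c in 0..n+1 with the smallest expected loss.
    A value of n + 1 means that no score is high enough to pass."""
    return min(range(n + 2),
               key=lambda c: expected_loss(c, n, prior, alpha, beta, l1, l2))


# One prior from each interval quoted in the text.
cutoffs = [optimal_passing_score(5, prior, 0.5, 0.1) for prior in (0.30, 0.70, 0.95)]
```

Under these assumptions the three priors give passing scores of 5, 4,
and 3 respectively, matching the intervals quoted above.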



Test Length and Sequential Testing

Figure 2 illustrates the ML model for a one-item test. The two
sets of $\alpha$, $\beta$ parameters chosen are representative of
typical selected-response and constructed-response tests.






The mathematical interpretation of the ML model permits a simplified
computation of P(M/X) when one or more items are added sequentially to
an n-item test. If a single item is added to an n-item test, Figure 2 can
be used to obtain a revised or posterior value for P(M/X); the P(M/X) for
an n-item test is used as the prior probability value for an (n + 1)-item
test.

Example 1

(a) Prior probability assumed to be .6,
(b) Student gave 4 correct responses on a 5-item test,
(c) $\alpha$ = .5, $\beta$ = .1

The student then gave a correct response on a sixth item.
From Figure 1, P(M/X) for the 5-item test is .76; using this value as a
prior probability, P(M/X) for the 6-item test from Figure 2 is .85.
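
The one-item update read from Figure 2 can also be computed directly.
The following is a minimal sketch, assuming the same Bayes' formula
applied to a single Bernoulli trial; the function name
`update_one_item` is hypothetical.

```python
def update_one_item(prior: float, correct: bool,
                    alpha: float, beta: float) -> float:
    """Revise the probability of mastery after one additional item."""
    if correct:
        like_master, like_nonmaster = 1 - beta, alpha
    else:
        like_master, like_nonmaster = beta, 1 - alpha
    numerator = prior * like_master
    return numerator / (numerator + (1 - prior) * like_nonmaster)


# Example 1: the 5-item posterior .76 serves as the prior for item six.
after_sixth_item = update_one_item(0.76, True, 0.5, 0.1)
```

The computed value is approximately .85, matching the figure quoted in
Example 1. An incorrect sixth response would instead lower the
probability of mastery sharply, because masters rarely err when beta is
small.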

The effect of doubling the test length can be estimated from
Figure 1 in a similar manner.

Example 2

(a) Prior probability assumed to be .5,
(b) Student gave 8 correct responses on a ten-item test,
(c) $\alpha$ = .5, $\beta$ = .1

Each combination of 5-item test responses which results in 8 correct
responses on a ten-item test is tabulated.

Initial 5 items        Final 5 items     Combined
X       P(M/X)         X                 P(M/X)

3       .190           5                 .82
4       .677           4                 .82
5       .950           3                 .82






Note that the P(M/X) for the combined 10-item test is the same for each
possible combination of correct responses on initial and final tests.
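
This order independence can be verified numerically. The sketch below
recomputes Example 2 two ways: in two 5-item stages, with the first
posterior used as the second prior, and in a single 10-item step. The
function name `posterior_mastery` is a hypothetical helper, not
something defined in the memorandum.

```python
from math import comb


def posterior_mastery(x: int, n: int, prior: float,
                      alpha: float, beta: float) -> float:
    """P(M/X) by Bayes' formula for a score of x on an n-item test."""
    like_master = comb(n, x) * (1 - beta) ** x * beta ** (n - x)
    like_nonmaster = comb(n, x) * alpha**x * (1 - alpha) ** (n - x)
    numerator = prior * like_master
    return numerator / (numerator + (1 - prior) * like_nonmaster)


alpha, beta, prior = 0.5, 0.1, 0.5

# Single step: 8 correct on a ten-item test.
direct = posterior_mastery(8, 10, prior, alpha, beta)

# Two stages: every split of 8 correct responses across two 5-item tests.
staged = {}
for first, second in [(3, 5), (4, 4), (5, 3)]:
    interim = posterior_mastery(first, 5, prior, alpha, beta)
    staged[(first, second)] = posterior_mastery(second, 5, interim, alpha, beta)
```

Every staged value agrees with the direct value (about .82), because the
posterior depends on the responses only through the total number of
correct and incorrect answers.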

Mastery Tests vs. Criterion-Referenced Tests

Mastery tests may be viewed as a special case of criterion-referenced
tests:

(a) The criterion level is a perfect score.
(b) The only true scores are assumed to be 0 and n (for an n-item
    test); all intermediate scores are due to measurement error.

It is the second characteristic (b) which permits the application of a
simple decision rule to determine the "passing" score.

The ML model is designed specifically for mastery tests; the CRT
model is appropriate for situations where it is meaningful to speak of
"degree of performance." For example, the CRT model may be used to
estimate the percentage of words in a lengthy vocabulary list that a
student can read. The CRT model would seem to be most appropriate for
evaluating performance when the content domain of the objective is so
large as to require an item-sampling procedure. The ML model can be
used for single-item tests; longer tests are conceptualized as
replications of single-item tests. The ML model is most appropriate for
narrowly defined behavioral objectives for which performance can be
conceptualized as "all or nothing."

To be useful as a diagnostic tool, a test must break down performance
into separate skills for which prescriptive treatments are available and
effective. The ML model is well suited to measuring skills at a level
of specificity which is desirable for remedial instruction. The ML
model can be applied to multiple-choice tests which can be constructed
so that the distractor responses represent meaningful types [illegible].

The ML model is designed to be used in making placement or
[illegible] decisions from diagnostic information. Individually
[illegible] drill and practice exercises, tutorials, or small-group
instruction are types of treatments which may be prescribed from mastery
learning [illegible]. The CRT model is appropriate for making placement
decisions based on degree of proficiency rather than diagnosis of
learning deficiencies. [Illegible] may be treated as a measure of
aptitude for future learning if aptitude is viewed as the amount of time
required by a learner to attain mastery of a learning task (Carroll,
1963). Placement decisions based on aptitudes may improve the efficiency
of instruction. The formation of instructional groups for initial
instruction would be an example of this type of placement decision.

Comparison of the two test models leads to the following conclusions:

1. The ML model is applicable to very short tests (5 items or
   fewer) and is appropriate for instructional decisions related
   to specific behavioral objectives.
2. The CRT model is suited for longer tests (approximately 20 or
   more items) unless testing can be done sequentially by item.
3. The CRT model may be better than the ML model for more general
   behavioral objectives (e.g., an extensive content domain).
4. The CRT model may be used as the basis for forming instructional
   groups for a group-oriented mode of instruction; mastery tests ...





REFERENCES

Carroll, J. B. "A Model of School Learning," Teachers College Record,
     1963, (Vol. 64), 723-733.

Emrick, J. A., & Adams, E. N. "An Evaluation Model for Individualized
     Instruction," Paper presented at the Annual Meeting of the
     American Educational Research Association, Minneapolis,
     Minnesota, 1970.

Guttman, L. "Integration of Test Design and Analysis," in the
     Proceedings of the 1969 Invitational Conference on Testing
     Problems, Educational Testing Service, 1970, 53-65.

Kriewall, T. E. "Applications of Information Theory and Acceptance
     Sampling Principles to the Management of Mathematics
     Instruction," Technical Report No. 103, Wisconsin Research
     and Development Center for Cognitive Learning, Madison,
     Wisconsin, 1969.

Lord, F. M., & Novick, M. R. Statistical Theories of Mental Test
     Scores. Addison-Wesley, 1968.




